IR additional health status #2934

Merged
merged 3 commits into from
Sep 17, 2024

Contributor

@End-rey End-rey commented Sep 11, 2024

Closes #2923.

I'm not sure I've found the right name for the status. I was also thinking of adding one more status before updating contracts, since that is the longest step.
I made the metric type the same as in the storage node (a number taken from the enum), but I also considered a string parameter.
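
To make the two metric options concrete, here is a minimal sketch of how each could look in the Prometheus text exposition format. It is hand-rolled with the standard library only and does not use the Prometheus client library or the PR's actual metrics code. The metric name, the Go identifiers, and the helper functions are hypothetical; only `READY = 2` and the `HEALTH_STATUS_UNDEFINED` name come from this thread.

```go
package main

import "fmt"

// HealthStatus is a stand-in for the control API enum. READY = 2 is taken
// from the review thread; everything else here is an assumption.
type HealthStatus int32

const (
	HealthStatusUndefined HealthStatus = 0
	Ready                 HealthStatus = 2
)

// String returns the enum name; values unknown to this sketch are reported
// as HEALTH_STATUS_UNDEFINED.
func (h HealthStatus) String() string {
	switch h {
	case Ready:
		return "READY"
	default:
		return "HEALTH_STATUS_UNDEFINED"
	}
}

// numericGauge renders the status as the gauge value
// (the "number from enum" option).
func numericGauge(name string, h HealthStatus) string {
	return fmt.Sprintf("# TYPE %s gauge\n%s %d\n", name, name, int32(h))
}

// labeledGauge renders the status as a label with a constant value of 1
// (the "string parameter" option).
func labeledGauge(name string, h HealthStatus) string {
	return fmt.Sprintf("# TYPE %s gauge\n%s{status=%q} 1\n", name, name, h.String())
}

func main() {
	const name = "example_ir_health" // hypothetical metric name
	fmt.Print(numericGauge(name, Ready))
	fmt.Print(labeledGauge(name, Ready))
}
```

The numeric form keeps a single time series but requires the reader to know the enum; the labeled form is self-describing but creates one series per status value.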


codecov bot commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 64.28571% with 5 lines in your changes missing coverage. Please review.

Project coverage is 23.91%. Comparing base (1a5809e) to head (5f97a86).
Report is 20 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/innerring/state.go | 0.00% | 2 Missing ⚠️ |
| pkg/metrics/innerring.go | 81.81% | 2 Missing ⚠️ |
| pkg/innerring/innerring.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2934      +/-   ##
==========================================
+ Coverage   23.88%   23.91%   +0.03%     
==========================================
  Files         775      776       +1     
  Lines       45610    45721     +111     
==========================================
+ Hits        10892    10936      +44     
- Misses      33861    33925      +64     
- Partials      857      860       +3     



  // IR application is started and serves all services.
- READY = 2;
+ READY = 3;

Member

I think you have broken older versions of the CLI that expect 2 as READY:

if healthStatus != control.HealthStatus_READY {
	os.Exit(1)
}

It is an internal API, but I still think it is not worth it, @roman-khimov

Member

Also, FYI, if we do such things we usually add something to our CHANGELOG to prepare our users for their catastrophic update, e.g.:

neofs-node/CHANGELOG.md

Lines 154 to 170 in 4a1bc79

### Updating from v0.40.1
Remove `notification` section from all SN configuration files: it is no longer
supported. All NATS servers running for this purpose only are no longer needed.
If your app depends on notifications transmitted to NATS, do not update and
create an issue please.

Stop attaching `__NEOFS__NETMAP*` X-headers to NeoFS API requests. If your app
is somehow tied to them, do not update and create an issue please.

Notice that this is the last release containing `blobovnicza-to-peapod`
migration utility. Blobovniczas were removed from the node since 0.39.0, so
if you're using any current NeoFS node version it's not a problem. If you're
using 0.38.0 or lower with blobovniczas configured, please migrate ASAP.

Remove `grpc.tls.use_insecure_crypto` from any storage node configuration.

Remove `timers.emit` from any inner ring configuration.

Member

Compatibility had better be kept; it doesn't cost a lot.

Contributor Author

Added the new statuses after the existing ones, so older clients no longer break.
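
A minimal sketch of why appending keeps older clients working, assuming a CLI that was compiled against the old enum and compares the received number with 2. Only `READY = 2` and the status names `HEALTH_STATUS_UNDEFINED` and `INITIALIZING_NETWORK` come from this thread; the other names, all other numbers, and the `oldCLIExitCode` helper are assumptions for illustration.

```go
package main

import "fmt"

// HealthStatus is a stand-in for the generated control API enum.
type HealthStatus int32

const (
	HealthStatusUndefined HealthStatus = 0
	Starting              HealthStatus = 1 // assumed name and number
	Ready                 HealthStatus = 2 // from the thread: older CLIs expect 2
	ShuttingDown          HealthStatus = 3 // assumed name and number
	InitializingNetwork   HealthStatus = 4 // assumed number, appended after the existing ones
)

// oldCLIExitCode mimics a health check built against the old enum: it only
// knows that READY is 2 and treats every other number as "not ready".
func oldCLIExitCode(wire int32) int {
	const oldReady = 2
	if wire != oldReady {
		return 1
	}
	return 0
}

func main() {
	// READY kept its number, so the old check still succeeds.
	fmt.Println(oldCLIExitCode(int32(Ready)))
	// A newly appended status is simply "not ready" for the old client.
	fmt.Println(oldCLIExitCode(int32(InitializingNetwork)))
}
```

If READY had been renumbered to 3 instead, the same old check would return 1 for a node that is actually ready, which is the breakage raised in the review above.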

  func New(ctx context.Context, log *zap.Logger, cfg *viper.Viper, errChan chan<- error) (*Server, error) {
  	var err error
  	server := &Server{log: log}

- 	server.setHealthStatus(control.HealthStatus_HEALTH_STATUS_UNDEFINED)
+ 	server.setHealthStatus(control.HealthStatus_CREATE_SERVER)

Member

I think HealthStatus_CREATE_SERVER was not a problem at all. You started a server, it only created &Server{log: log}, so it is really undefined now. I guess the initial issue was mostly about extending the codes: you set something like "STARTING_BLOCKCHAIN" and then call server.bc.Run(ctx); you set something like "DEPLOYING_NETWORK" and then call deploy.Deploy(ctx, deployPrm), etc. That lets an admin understand more about what is happening: starting the IR may take minutes now, and all an admin gets is minutes of "undefined" that then goes immediately to READY.

Member

A rule of thumb may be: give a separate status to any operation that is not local CPU work but I/O that may block and that an admin should know about.

Contributor Author

@End-rey End-rey Sep 13, 2024

Added new statuses for the steps that seem to involve I/O and can block, but I'm not sure about them.

CHANGELOG.md Outdated
@@ -5,6 +5,8 @@ Changelog for NeoFS Node

### Added
- More effective FSTree writer for HDDs, new configuration options for it (#2814)
- New health statuses in inner ring (#2934)

Contributor

Let's add this record with the corresponding commit.

Signed-off-by: Andrey Butusov <[email protected]>
New health status for a time-consuming process, initializing the Neo network by
the blockchain: `INITIALIZING_NETWORK`.

Closes #2923.

Signed-off-by: Andrey Butusov <[email protected]>
Expose health status of IR via Prometheus.

Signed-off-by: Andrey Butusov <[email protected]>
Contributor

@cthulhu-rider cthulhu-rider left a comment

idk why lint action failed

@roman-khimov
Member

> idk why lint action failed

Looks a lot like nspcc-dev/neo-go#3416, to be fixed with #2940.

@roman-khimov roman-khimov merged commit 4f22042 into master Sep 17, 2024
20 of 21 checks passed
@roman-khimov roman-khimov deleted the 2923-health_status_undefined branch September 17, 2024 12:15

Successfully merging this pull request may close these issues.

Health status: HEALTH_STATUS_UNDEFINED
4 participants